
[codex] Fix recurrent cache resize before checkpoint restore #50

Merged
spiritbuun merged 1 commit into spiritbuun:master from Nowayz:codex/fix-recurrent-checkpoint-restore
May 17, 2026


Nowayz commented May 9, 2026

Summary

Recurrent backup cells are expanded for speculative decoding, but the server does not shrink them back before a later prompt-cache or checkpoint restore. As a result, the restore runs against a different recurrent-cache topology than the one in effect when the checkpoint was made.
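To make the mismatch concrete, here is a toy model of the problem, not the server code. `RecurrentCache`, `Checkpoint`, and every function below are hypothetical names invented for illustration. The sketch assumes a checkpoint is only valid for the cell count it was taken with, and it makes the mismatch explicit by refusing the restore; the real code path may not detect the mismatch at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the recurrent cache: one entry per cell,
// holding the owning sequence id (-1 means the cell is free).
struct RecurrentCache {
    std::vector<int32_t> cell_seq;
};

// Hypothetical checkpoint: a copy of the cell layout at save time.
struct Checkpoint {
    std::vector<int32_t> cell_seq;
};

Checkpoint save_checkpoint(const RecurrentCache & cache) {
    return Checkpoint{cache.cell_seq};
}

// Speculative decoding appends free backup cells to the cache.
void expand_for_speculation(RecurrentCache & cache, std::size_t n_backup) {
    cache.cell_seq.resize(cache.cell_seq.size() + n_backup, -1);
}

// Return the cache to the non-speculative cell count.
void shrink_to(RecurrentCache & cache, std::size_t n_cells) {
    assert(n_cells <= cache.cell_seq.size());
    cache.cell_seq.resize(n_cells);
}

// In this toy model a restore is refused when the cell count differs
// from the one recorded in the checkpoint.
bool restore_checkpoint(RecurrentCache & cache, const Checkpoint & cp) {
    if (cache.cell_seq.size() != cp.cell_seq.size()) {
        return false; // topology differs from the one at save time
    }
    cache.cell_seq = cp.cell_seq;
    return true;
}
```

In this model, saving with two cells, expanding by two backup cells, and then restoring fails; calling `shrink_to(cache, 2)` first lets the restore succeed.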

Fixes #49.

Code Changes

  • Add a server prefill shrink step before prompt-cache save/load so recurrent memory returns to the non-speculative topology once draft backup cells are no longer active.
  • Remove speculative backup sequences before shrinking recurrent memory.
  • Skip shrinking while a slot is processing or still owns a draft backup.
  • Sanitize recurrent cell metadata after resize by dropping invalid sequence ids, clearing stale source rows, rebuilding tail pointers, and clamping cached range metadata.
  • Guard recurrent source-row lookup so invalid metadata falls back to the zero state instead of reaching backend GET_ROWS with an invalid row.
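
The last two items can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Cell`, `Meta`, `sanitize_after_resize`, and `source_row_or_zero` are hypothetical names, a return value of `-1` stands in for "use the zero state", and the tail rule (the last cell referencing a sequence wins) is a simplification. Clamping of cached range metadata is left out because the description does not say which ranges are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified per-cell metadata.
struct Cell {
    std::set<int32_t> seq_ids; // sequences referencing this cell
    int32_t src = -1;          // source row to copy state from, -1 = none
};

struct Meta {
    std::vector<Cell> cells;
    std::vector<int32_t> tails; // tails[seq] = cell with that sequence's state, -1 = none
};

// After a resize: drop invalid sequence ids, clear stale source rows,
// and rebuild the per-sequence tail pointers from what remains.
void sanitize_after_resize(Meta & m, int32_t n_seq_max) {
    assert(n_seq_max >= 0);
    const int32_t n_cells = static_cast<int32_t>(m.cells.size());
    m.tails.assign(static_cast<std::size_t>(n_seq_max), int32_t{-1});

    for (int32_t i = 0; i < n_cells; ++i) {
        Cell & c = m.cells[i];
        for (auto it = c.seq_ids.begin(); it != c.seq_ids.end();) {
            if (*it < 0 || *it >= n_seq_max) {
                it = c.seq_ids.erase(it); // sequence no longer exists
            } else {
                ++it;
            }
        }
        if (c.src < 0 || c.src >= n_cells) {
            c.src = -1; // source row fell outside the resized cache
        }
        for (int32_t s : c.seq_ids) {
            m.tails[static_cast<std::size_t>(s)] = i;
        }
    }
}

// Guarded lookup: return a valid source row, or -1 so the caller
// substitutes the zero state instead of reading an invalid row.
int32_t source_row_or_zero(const Meta & m, int32_t cell) {
    const int32_t n_cells = static_cast<int32_t>(m.cells.size());
    if (cell < 0 || cell >= n_cells) {
        return -1;
    }
    const int32_t src = m.cells[static_cast<std::size_t>(cell)].src;
    return (src >= 0 && src < n_cells) ? src : -1;
}
```

The point of the guard is that it stays safe even if sanitization is skipped or incomplete: an out-of-range index never reaches the row-gather step.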

Validation

  • Built llama-server in a Windows CUDA Release build.
  • Checked the branch diff to confirm only the recurrent-cache fix files are included.

@spiritbuun spiritbuun marked this pull request as ready for review May 17, 2026 00:42
@spiritbuun spiritbuun merged commit ea097bb into spiritbuun:master May 17, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

Context checkpoint restore can use expanded recurrent cache topology after speculative decode
